-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate TaskRun compatibility with the Affinity Assistant #2885
Validate TaskRun compatibility with the Affinity Assistant #2885
Conversation
@jlpettersson: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This PR cannot be merged: expecting exactly one kind/ label Available
|
The following is the coverage report on the affected files.
|
6ef27af
to
8fa78cc
Compare
The following is the coverage report on the affected files.
|
/release-note-none |
} | ||
|
||
if pvcWorkspaces > 1 { | ||
return fmt.Errorf("TaskRun mounts more than one PersistentVolumeClaim - that is forbidden when the Affinity Assistant is enabled") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be "mounts more than one writeable PVC"? I thought it was OK to receive multiple PVCs in read-only? Or will that also trip into potential deadlock scenarios?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good question @sbwsg
TLDR; as the message is written - it matches the current implementation of the Affinity Assistant.
The Affinity Assistant makes so that TaskRun pods are scheduled to the same Node. This actually helps to avoid two different problems:
- All pods sharing a PVC (e.g. with accessMode ReadWriteOnce) can now execute in parallel - this may be possible with other accessModes e.g. ReadWriteMany even without the Affinity Assistant.
- All pods sharing a PVC (e.g. with a zonal StorageClass) is scheduled to the same AZ, so it will not deadlock the pipeline - this may be possible with regional StorageClasses even without the Affinity Assistant - but on e.g. GCP regional storageClass volumes is only available on two of three AZs - so it may still be a problem. If using accessMode ReadWriteMany pods can execute in parallel within an AZ - but for regional clusters they additionally need to be regional StorageClass to avoid the problems that the Affinity Assistant solves.
The guidelines about "at most one writeable workspace" is for Tasks - e.g. they don't say if it is a PVC or emptyDir or regional StorageClass. With "writeable" I also meant PVCs - e.g. using a Secret or ConfigMap-workspace is still fine for Tasks - in addition to a PVC workspace.
I may add a new section to the workspace documentation to document more about this - it is a bit complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand - using a ReadWriteOnce
in readOnly mode and regional storageClass - should work - but is currently not supported/implemented by the Affinity Assistant. We may want to implement it - but it is a bit complicated too - and probably a rarely used use case.
/approve |
pkg/reconciler/taskrun/taskrun.go
Outdated
if err := validateWorkspaceCompatibilityWithAffinityAssistant(tr); err != nil { | ||
logger.Errorf("TaskRun %q workspaces are invalid: %v", tr.Name, err) | ||
tr.Status.MarkResourceFailed(podconvert.ReasonFailedValidation, err) | ||
return nil, nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a PermanentError
otherwise the controller will requeue this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Good catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this, it looks good - only one issue, a permanent error should be returned instead.
One of the matrix uni tests checks for all invalid situations, and enforces the permanent error. I wonder if we could use that test instead of creating a new one?
I'm going to add this to the 0.15 milestone as it would be great to get the validation in place in time for the next release. |
I'm also happy to pick this up @jlpettersson if you don't have time to finish. let me know! |
8fa78cc
to
48d704a
Compare
The following is the coverage report on the affected files.
|
@sbwsg I fixed the Thanks @sbwsg and @afrittoli for review! |
Awesome, this looks great to me as it is but I'll leave it to @afrittoli to give the PermanentError another look. |
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update. The change looks good, but unfortunately you'll need to rebase first.
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: afrittoli, sbwsg The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
A TaskRun that mount more than one PVC-backed workspace is incompatible with the Affinity Assistant. But there is no validation if the TaskRun is compatible - so the TaskRun Pod is stuck with little information on why to the user. This commit adds validation of TaskRuns. When a TaskRun is associated with an Affinity Assistant, it is checked that not more than one PVC workspace is used - if so, the TaskRun will fail with a TaskRunValidationFailed condition. Proposed in tektoncd#2829 (comment) Closes tektoncd#2864
48d704a
to
1cd4d8c
Compare
The following is the coverage report on the affected files.
|
/hold cancel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/test pull-tekton-pipeline-integration-tests |
1 similar comment
/test pull-tekton-pipeline-integration-tests |
Changes
A TaskRun that mount more than one PVC-backed workspace is incompatible
with the Affinity Assistant. But there is no validation if the TaskRun
is compatible - so the TaskRun Pod is stuck with little information on why
to the user.
This commit adds validation of TaskRuns. When a TaskRun is associated with
an Affinity Assistant, it is checked that not more than one PVC workspace
is used - if so, the TaskRun will fail with a TaskRunValidationFailed condition.
Proposed in #2829 (comment)
Closes #2864
/kind feature
/release-note-none
Submitter Checklist
These are the criteria that every PR should meet, please check them off as you
review them:
See the contribution guide for more details.
Double check this list of stuff that's easy to miss:
cmd
dir, please updatethe release Task to build and release this image.
Reviewer Notes
If API changes are included, additive changes must be approved by at least two OWNERS and backwards incompatible changes must be approved by more than 50% of the OWNERS, and they must first be added in a backwards compatible way.